R21C: Revise distribution: add slurm-array; rev stats #296
Just for the record: after a conversation with @elakkraoui, @rlucches, and @sdrabenh, it was agreed that the ensemble diagnostics will be written during the PREDICTOR part of the 12-hour IAU integration, just as they are written during the spin-up period (with a fix to produce the aerosol output). Since only fields from the predictor part are desired, I am cutting down the output size by having History write out only the derivable fields during the predictor part. This differs from the spin-up period, where a snapshot is also produced during the corrector part; those snapshots are not needed.
@mathomp4 Hi Matt, can you help me get this in? It is going into the R21C projected branch. Thank you.
@rtodling I approved it and undrafted it. Feel free to merge at your pleasure.
This branch brings in the revised ensemble parallelization options implemented in develop (aimed at FP).
The only two procedures for which I am enabling Slurm arrays are atmens_recenter and post_egcm. I have made sure the old-style parallelization works again for the other tasks, such as the observer and GCM calls, and I have reset the DST variables in the AtmEnsConfig files to parallelize the ensemble as before.
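As an illustration of how a Slurm job array splits per-member work, here is a minimal Python sketch (the actual scripts are not shown here). It assumes the array is submitted with one task per ensemble member, with task IDs equal to member numbers (e.g. `sbatch --array=1-32`), and that members live in `memMMM` subdirectories; `member_dir` is a hypothetical helper, not code from atmens_recenter or post_egcm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def member_dir(ens_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the memMMM directory this Slurm array task should process.

    Hypothetical helper: assumes one array task per ensemble member,
    with task IDs equal to member numbers (e.g. sbatch --array=1-32).
    """
    env = os.environ if env is None else env
    task_id = env.get("SLURM_ARRAY_TASK_ID")  # set by Slurm for array jobs
    if task_id is None:
        raise RuntimeError("SLURM_ARRAY_TASK_ID is not set; not in a job array")
    member = int(task_id)
    if member < 1:
        raise ValueError(f"member numbers start at 1, got {member}")
    return ens_root / f"mem{member:03d}"


if __name__ == "__main__":
    # Simulate array task 7; inside a real array job, omit the env argument.
    print(member_dir(Path("ensdiag"), {"SLURM_ARRAY_TASK_ID": "7"}))
```

Each array task then touches only its own member directory, which is what makes independent per-member steps easy to distribute this way.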
In doing so, I also stumbled on the changes that had been made to atmens_stats.csh; I believe they introduced more hardwiring than needed, so I revised things accordingly. The most noticeable changes are:
1) I see no need for if-blocks to create the time stamps of the files to be handled (i.e., variables such as timetagz can be built automatically).
2) Also, I see no need for multiple mp_statsXX.rc files controlling different streams, at least not for the reason these extra files were being used: they simply had different numbers of levels. It turns out to be easy to get the number of levels on the fly and edit the standard mp_stats.rc on the fly as well.
3) A third issue, which has been there for a while, was that the hidden files associated with the successful termination of specific executables were being created in child directories rather than in the parent directory. I revised this in develop to follow the overall strategy of the ensemble design, which places the hidden files in the parent directory.
4) The other thing I noticed in the stats is that mp_statsXX.rc uses 48 PEs to run the stats. I think that is far too many PEs, and it causes the jobs to sit too long in the queue. I reduced this to the x-exp setting, namely 4 PEs.
5) Now, an addition has been made to atmens_stats in the R21C branch that calculates the mean and variance of fields in ensdiag. That's good (this is something that had been missing), but there are a few issues with it:
5a) When the data in ensdiag are handled, hidden files get placed in the child directory (ensdiag). I will revise this so that all hidden files are placed in the parent directory.
5b) I noticed that the mean and variance are not calculated for all files in ensdiag/memMMM. The reason is that post_egcm is somewhat hardwired to handle files in the background period of the model integration. This is easy to unwire (I am working on it), which will allow stats to be calculated for all output in ensdiag.
5c) HOWEVER, note that the present spin-up period (and a few of the actual streams) calculates stats for diagnostic files from the GCM during the predictor period. This is not the GMAO convention, which is to produce products within the corrector phase of the integration. I can address this when tackling this item (but, @elakkraoui, we need to talk and reach an agreement on this).
6a) The aerosol diagnostic is an instantaneous file stream, but it is labeled as tavg in the post_egcm file controlling output, so no mean and variance files were being calculated for this stream.
6b) The stream int_tavg_6hr_glo_L288x181_slv was coming out at an incorrect time because its entry in History was missing ref_time.
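The time-tag and level-count points above can be sketched as follows: derive the time tags from the window start and output interval, and substitute the level count into a single rc template, instead of hardwiring tags or keeping one rc file per level count. This is a minimal Python sketch under stated assumptions: the `YYYYMMDD_HHMMz` tag format, the `>>>NLEVS<<<` placeholder, and the helper names are hypothetical, and in practice the level count would be read from the file itself (for example, the length of its level dimension).

```python
from datetime import datetime, timedelta
from typing import List

PLACEHOLDER = ">>>NLEVS<<<"  # hypothetical token in a single mp_stats.rc template


def time_tags(start: datetime, window_hours: int, step_hours: int) -> List[str]:
    """Build tags (assumed YYYYMMDD_HHMMz format) from start to start+window, inclusive."""
    if step_hours <= 0:
        raise ValueError("step_hours must be positive")
    end = start + timedelta(hours=window_hours)
    tags = []
    t = start
    while t <= end:
        tags.append(t.strftime("%Y%m%d_%H%Mz"))
        t += timedelta(hours=step_hours)
    return tags


def fill_rc(template: str, nlevs: int) -> str:
    """Return the rc text with the level count substituted for the placeholder."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} placeholder")
    return template.replace(PLACEHOLDER, str(nlevs))


if __name__ == "__main__":
    # A 12-hour window starting at 21z with 6-hourly output gives three tags.
    print(time_tags(datetime(2020, 1, 1, 21), 12, 6))
    print(fill_rc("nlevs: >>>NLEVS<<<\n", 72), end="")
```

Because both values are computed at run time, one rc template can serve streams with different numbers of levels.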
I have addressed the issues in (5) above; this required changing the API of post_egcm and making changes to atm_ens.j, all minor and all related to diagnostics. Now the mean and variance are calculated for all desired diagnostics. Note that I introduced a post_egcm_diag.rc file that controls the diagnostic statistics; this was done to avoid internal (unnecessary) logic inside post_egcm.
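To illustrate the rc-driven approach, here is a minimal Python sketch that reads the list of streams from a control file, computes the point-wise ensemble mean and variance across members, and touches a hidden completion marker in the parent directory. The control-file format (one stream name per line, `#` comments), the `.DONE_STATS.<stream>` marker name, and the use of the N-1 sample variance are assumptions; the actual contents of post_egcm_diag.rc and the normalization used by post_egcm are not described here.

```python
import tempfile
from pathlib import Path
from statistics import fmean, variance
from typing import List, Sequence, Tuple


def read_streams(rc_text: str) -> List[str]:
    """Parse a hypothetical control file: one stream per line, '#' starts a comment."""
    streams = []
    for line in rc_text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            streams.append(name)
    return streams


def ens_mean_var(members: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Point-wise mean and sample variance (N-1, assumed) across ensemble members."""
    if len(members) < 2:
        raise ValueError("need at least two members to compute a variance")
    npts = len(members[0])
    if any(len(m) != npts for m in members):
        raise ValueError("all members must have the same number of points")
    means, variances = [], []
    for values in zip(*members):  # values = one grid point across all members
        means.append(fmean(values))
        variances.append(variance(values))
    return means, variances


def mark_done(parent: Path, stream: str) -> Path:
    """Touch a hidden completion marker in the parent dir, not in memMMM."""
    marker = parent / f".DONE_STATS.{stream}"
    marker.touch()
    return marker


if __name__ == "__main__":
    rc = "# streams to process\nint_tavg_6hr_glo_L288x181_slv\n"
    members = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    with tempfile.TemporaryDirectory() as tmp:
        for stream in read_streams(rc):
            mean, var = ens_mean_var(members)
            print(stream, mean, var)
            print(mark_done(Path(tmp), stream).exists())
```

Keeping the stream list in a control file means that adding a stream (for example, the aerosol one) changes the rc file rather than the program logic.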